Highly Degraded Recto-verso Document Image Processing and Understanding

Authors

  • Emanuele Salerno
  • Anna Tonazzini
Abstract

can be interpreted as matching an input graph (the keyword) against a large set of graphs (the document). More formally, to spot a keyword w_i, all p graph instances g_i1, ..., g_ip of that word occurring in the training set are matched against all word graphs in each text line using our adapted graph matching procedure. That is, for a given word w_i and a specific text line s, the pairwise distances between all prototype graphs g_i1, ..., g_ip and the m word graphs g'_1, ..., g'_m of text line s are computed first. The minimum of these graph distances serves as the distance d(w_i, s) of the keyword's word class w_i to the text line s. If d(w_i, s) is below a given threshold, the text line s and the word from s with the minimum distance are returned as a positive match for the keyword w_i.
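The spotting rule described above (minimum pairwise graph distance per text line, then a threshold test) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the adapted graph matching procedure, so the distance is passed in as a callable, and the names `keyword_distance`, `spot_keyword`, and `toy_distance` are hypothetical. The toy distance simply counts differing labels between two label sets and stands in for a real graph distance.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

G = TypeVar("G")  # placeholder for whatever graph representation is used


def keyword_distance(
    prototypes: Sequence[G],
    line_words: Sequence[G],
    graph_distance: Callable[[G, G], float],
) -> Tuple[float, Optional[int]]:
    """Return the minimum prototype-to-word distance for one text line,
    together with the index of the word graph that attains it."""
    best, best_idx = float("inf"), None
    for g in prototypes:
        for j, g_prime in enumerate(line_words):
            d = graph_distance(g, g_prime)
            if d < best:
                best, best_idx = d, j
    return best, best_idx


def spot_keyword(
    prototypes: Sequence[G],
    lines: Sequence[Sequence[G]],
    graph_distance: Callable[[G, G], float],
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Return (line index, word index, distance) for every text line
    whose keyword distance is below the threshold."""
    matches = []
    for s, line_words in enumerate(lines):
        d, j = keyword_distance(prototypes, line_words, graph_distance)
        if j is not None and d < threshold:
            matches.append((s, j, d))
    return matches


def toy_distance(a: frozenset, b: frozenset) -> float:
    """Assumed stand-in for a graph distance: size of the symmetric
    difference between two label sets."""
    return float(len(a ^ b))


# Two prototype "graphs" of the keyword and two text lines of word "graphs".
prototypes = [frozenset("abc"), frozenset("ab")]
lines = [
    [frozenset("xy"), frozenset("abd")],
    [frozenset("p"), frozenset("qr")],
]
matches = spot_keyword(prototypes, lines, toy_distance, threshold=2.0)
```

Passing the distance as a parameter keeps the thresholding logic independent of the matching procedure, so any graph distance can be substituted for the toy one. An empty text line yields an infinite distance and is never reported as a match.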

Similar resources

Reflectance and transmittance model for recto-verso halftone prints.

We propose a spectral prediction model for predicting the reflectance and transmittance of recto-verso halftone prints. A recto-verso halftone print is modeled as a diffusing substrate surrounded by two inked interfaces in contact with air (or with another medium). The interaction of light with the print comprises three components: (a) the attenuation of the incident light penetrating the print...

Reduction of Bleed-through in Scanned Manuscript Documents

Many old manuscript documents were written on both sides of the paper, and the bleed-through from one side of the document to the other increases the difficulty in reading or deciphering the information on the page. This paper presents techniques for reducing such bleed-through distortion using techniques of digital image processing. Both sides of the document are scanned, maintaining full spat...

Restoration of recto-verso colour documents using correlated component analysis

In this article, we consider the problem of removing see-through interferences from pairs of recto–verso documents acquired either in grayscale or RGB modality. The see-through effect is a typical degradation of historical and archival documents or manuscripts, and is caused by transparency or seeping of ink from the reverse side of the page. We formulate the problem as one of separating two in...

A Ground Truth Bleed-Through Document Image Database

This paper introduces a new database of 25 recto/verso image pairs from documents suffering from bleed-through degradation, together with manually created foreground text masks. The structure and creation of the database are described, and three bleed-through restoration methods are compared in two ways: visually, and quantitatively using the ground truth masks.

Séparation recto/verso d’un document par modélisation markovienne à double couche

We propose a two-layer Markov model for separating the two sides of a document of which only one side has been scanned. Using two separate Markov random fields, one for each side, each pixel is modeled by two hidden variables connected by a single observed variable. The advantage of this formulation is a better fit to the process that created the observed image...

Journal title:
  • ERCIM News

Volume 2013  Issue 

Pages  -

Publication date 2013